A Brief Introduction to Git Data Structures

All data stored by Git resides within the .git folder. Deleting the .git folder is equivalent to deleting the local version control for that repository. Below are the main contents of the .git folder.

Folders

hooks

Stores various custom scripts that can be automatically executed at specific moments during Git operations, such as before or after commit, push, or merge. These scripts can be used to perform automated tests, check code style, and more. Common hooks include pre-commit, pre-push, and post-merge.

info

Stores auxiliary information files. By default, there is an "exclude" file used to define rules for excluding specific files or directories. It serves the same purpose as .gitignore, but "exclude" is a local setting applicable to an individual developer's environment. For team development, you should use .gitignore and track it in version control.

logs

Records the update history of references (such as branches and HEAD). These logs can be used to track who made what changes to a branch and when. Common contents include:

  • HEAD: Records the history of every HEAD change.
  • refs/heads: Stores the change history of each local branch.
  • refs/remotes/origin: Stores the update history of remote-tracking branches, as recorded by git fetch and git push. "origin" is the default name the local repository gives to the linked remote repository, though other names can be created as needed.

TIP

  • The Git command git reflog displays the contents of "logs/HEAD". If you have discarded commits with git reset --hard or git rebase -i, you can use git reflog to find the hash HEAD pointed to before the operation, then run git reset --hard with that hash to restore the commit.
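
Each line in "logs/HEAD" holds the old hash, the new hash, the identity of whoever made the change, a Unix timestamp with timezone, and then a tab followed by the message. The following is a minimal Python sketch of a parser for that layout. The function name parse_reflog_line and the sample line (hashes, name, e-mail) are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed layout: "<old> <new> <name> <email-in-angle-brackets> <unix-time> <tz>\t<message>"
REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]{40}) (?P<new>[0-9a-f]{40}) "
    r"(?P<name>.*) <(?P<email>[^>]*)> "
    r"(?P<time>\d+) (?P<tz>[+-]\d{4})\t(?P<message>.*)$"
)

def parse_reflog_line(line):
    """Split one reflog line into its fields."""
    match = REFLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized reflog line: {line!r}")
    entry = match.groupdict()
    # Convert the Unix timestamp to an aware datetime in UTC.
    entry["when"] = datetime.fromtimestamp(int(entry.pop("time")), tz=timezone.utc)
    return entry

# Hypothetical sample: an all-zero old hash marks the creation of the ref.
sample = (
    "0" * 40 + " " + "a" * 40
    + " Example User <user@example.com> 1722384000 +0800\t"
    + "commit (initial): first commit"
)
entry = parse_reflog_line(sample)
print(entry["new"][:7], entry["when"].date(), entry["message"])
```

The "new" field of an older entry is the hash you would pass to git reset --hard to return to that state.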

objects

  • Purpose: Stores all Git data objects, including blob, tree, commit, and tag objects.
  • Structure: Uses the first two characters of the object's SHA-1 hash as the directory name, and the remaining 38 characters as the filename. For example, an object with the SHA-1 hash d670460b4b4aece5915caf5c68d12f560a9fe3e4 will be stored at .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4.
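
The path rule above can be sketched in a few lines of Python. The helper name loose_object_path is hypothetical, and the rule only covers "loose" objects: Git may also pack objects into files under objects/pack, in which case no such individual file exists.

```python
from pathlib import PurePosixPath

def loose_object_path(sha1_hex, git_dir=".git"):
    """Return where a loose object with this hash would be stored."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character SHA-1 hex digest")
    # First two characters name the directory, the remaining 38 name the file.
    return PurePosixPath(git_dir) / "objects" / sha1_hex[:2] / sha1_hex[2:]

path = loose_object_path("d670460b4b4aece5915caf5c68d12f560a9fe3e4")
print(path)  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```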

Object Generation During Commit Operations

Assuming you commit a change to a single file, at least three objects are generated (one additional tree object for each nested directory on the path to the file):

  • Blob Object: Stores the actual content of the file. For example, newly added or modified file content is created and stored as a blob object.

  • Tree Object: Stores the contents of one directory: for each entry, its file mode, its name, and the SHA-1 hash of the corresponding blob (for a file) or tree (for a subdirectory). Together, the trees describe the structure of files and subdirectories.

  • Commit Object: Stores commit information, including the SHA-1 hash of the root tree object, the SHA-1 hash of the parent commit (or several parents, for a merge), the commit message, and author/committer information. The hash shown by git log and other tools refers to this object.
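
To make the hashing concrete, here is a small Python sketch of how an object ID is derived: Git hashes a header of the form "type, space, size, NUL byte" followed by the raw content, and a loose object file holds the zlib-compressed form of those same bytes. The helper names encode_object and object_id are made up for this example, and it assumes a SHA-1 repository.

```python
import hashlib
import zlib

def encode_object(kind, body):
    """Prefix the body with the header: kind, a space, the size, a NUL byte."""
    return f"{kind} {len(body)}".encode() + b"\x00" + body

def object_id(kind, body):
    """SHA-1 over header plus body, as a 40-character hex string."""
    return hashlib.sha1(encode_object(kind, body)).hexdigest()

content = b"test content\n"
blob_id = object_id("blob", content)

# A loose object file contains the zlib-compressed header plus body.
on_disk = zlib.compress(encode_object("blob", content))

print(blob_id)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the ID depends only on the type and content, two files with identical content share a single blob no matter what they are named; the names live in the tree object.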

For more detailed information on file content, please refer to the official documentation: "Git Internals - Git Objects".

refs

Each file under refs is named after a branch or tag, and its content is the hash of the commit it currently points to. Common folders include:

  • heads: Stores local branches. If a branch name contains "/", a corresponding directory structure is created. For example, for "feature/requirement1", a "feature" folder is created, containing the "requirement1" file.
  • remotes: Stores remote branches, using the remote repository name as the folder, such as "origin".
  • tags: Stores tags, one file per tag name.
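
As a rough illustration of this layout, the Python sketch below builds a throwaway directory that mimics refs/heads and reads a branch tip back from it. The function read_branch is hypothetical and the hash is only sample data. Real repositories may also keep refs in a single packed-refs file, which this sketch does not handle.

```python
import tempfile
from pathlib import Path

def read_branch(git_dir, branch):
    """Return the commit hash stored in refs/heads/<branch>."""
    ref_file = Path(git_dir) / "refs" / "heads" / branch
    return ref_file.read_text().strip()

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    # A "/" in the branch name becomes a directory level on disk.
    ref_file = git_dir / "refs" / "heads" / "feature" / "requirement1"
    ref_file.parent.mkdir(parents=True)
    ref_file.write_text("3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8\n")
    tip = read_branch(git_dir, "feature/requirement1")

print(tip)
```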

Files

COMMIT_EDITMSG

Records the message of the most recent commit. If you use git commit (without -m), git commit --amend, or edit a message while resolving a conflict, this file is opened in your editor. Some GUI tools provide their own UI for editing the message instead of opening this file.

TIP

git commit --amend does not take its initial message from whatever is already in this file; instead, the message of the previous commit is written into this file for editing.

config

Stores Git settings for the repository. This file is similar to .gitconfig, but it is primarily for repository-specific settings.

description

Read by GitWeb (Git's web interface) to display the repository's description.

index

A binary file (the staging area) that records the file snapshot of the latest commit together with the changes added via git add.

HEAD

Stores the name of the currently checked-out branch or a specific commit. When HEAD points to a branch (e.g., main), the file contains ref: refs/heads/main; when HEAD points to a specific commit (a "detached HEAD"), it contains the hash of that commit.
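
The two forms of the HEAD file can be told apart with a simple prefix check. A minimal Python sketch follows; the function name parse_head and the returned dictionary shape are choices made for this example, not anything Git defines.

```python
def parse_head(text):
    """Classify HEAD content as a symbolic ref or a detached commit hash."""
    value = text.strip()
    prefix = "ref: "
    if value.startswith(prefix):
        # HEAD follows a branch: return the full ref name.
        return {"detached": False, "ref": value[len(prefix):], "commit": None}
    # Otherwise HEAD holds a commit hash directly.
    return {"detached": True, "ref": None, "commit": value}

on_branch = parse_head("ref: refs/heads/main\n")
detached = parse_head("3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8\n")
print(on_branch["ref"])
print(detached["commit"])
```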

ORIG_HEAD

Stores the state of HEAD before performing destructive operations (such as git reset, git merge, etc.), used to restore to the previous state if necessary.

FETCH_HEAD

Records what the last git fetch retrieved for each branch. The format of each line is as follows:

```text
{Commit SHA-1} [not-for-merge] branch '{branch name}' of {remote repository URL}
```

Example:

```text
3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8 not-for-merge branch 'main' of http://127.0.0.1/wing/Project
```

[not-for-merge]: Marks a fetched branch that will not be merged into the current branch. git pull is actually git fetch + git merge; the branch selected for merging is written without this marker.
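
A line in this format can be split into its parts with a regular expression. The Python sketch below is illustrative only: parse_fetch_head_line is a made-up name, and the pattern accepts any whitespace between fields (the file on disk separates them with tabs, while the example above is shown with spaces).

```python
import re

FETCH_HEAD_LINE = re.compile(
    r"^(?P<sha>[0-9a-f]{40})\s+(?P<flag>not-for-merge)?\s*(?P<desc>.*)$"
)

def parse_fetch_head_line(line):
    """Extract hash, merge flag, branch name, and source from one line."""
    match = FETCH_HEAD_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized FETCH_HEAD line: {line!r}")
    branch = re.search(r"branch '([^']+)'", match["desc"])
    return {
        "sha": match["sha"],
        # No marker means this is the branch git pull would merge.
        "for_merge": match["flag"] is None,
        "branch": branch.group(1) if branch else None,
        "source": match["desc"].split(" of ", 1)[-1],
    }

record = parse_fetch_head_line(
    "3b3a827b86d264f9c81bc77ef6e0e3df5e302ae8 not-for-merge "
    "branch 'main' of http://127.0.0.1/wing/Project"
)
print(record)
```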

A Brief Discussion on Branches

From the Git structure described above, it is clear that branches and tags are simply references pointing to specific commits. A tag points to a fixed commit object, while a branch moves forward with every commit made on it. The branch graph starts from the commit object the branch points to, follows the parent hash recorded in each commit object back to the previous commit, and eventually produces the complete commit history structure.
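
The backward walk described here can be sketched in Python over a toy graph. Everything in the example (the commit IDs, the commits/branches/tags dictionaries, and the history function) is hypothetical, and the traversal order is a plain depth-first walk rather than the date-based ordering that git log uses.

```python
def history(commits, tip):
    """Yield every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue  # merges can reach the same ancestor twice
        seen.add(commit_id)
        yield commit_id
        # Push parents so the first parent is visited next.
        stack.extend(reversed(commits[commit_id]))

# Toy graph: each commit maps to its list of parents.
# c4 merges c3 and b1; both descend from c2.
commits = {"c1": [], "c2": ["c1"], "c3": ["c2"], "b1": ["c2"], "c4": ["c3", "b1"]}
branches = {"main": "c4"}   # a branch is a movable name for a commit
tags = {"v1.0": "c2"}       # a tag is a fixed name for a commit

main_history = list(history(commits, branches["main"]))
tag_history = list(history(commits, tags["v1.0"]))
print(main_history)
print(tag_history)
```

Making a new commit on main would only add one entry to commits and change branches["main"]; the tag entry stays where it is.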

The above was just to review the content from the Git course I took with Po-Ge a few years ago; what follows is the main part where I start rambling.

From the perspective of fantasy novels, the branch graph is like a known timeline, branches represent the current nodes, and tags are fixed historical coordinates. After each commit, if you use git reset or git rebase, the commits that existed before the operation are still stored in the "objects" folder (until Git's garbage collection eventually prunes them). The commit nodes in the branch graph are the determined past, while the nodes left behind by git reset are the possible futures. HEAD represents the current location in space-time. To travel through time, you can only see historical coordinates (tags) and known timelines (branch graphs); everything else must be queried using git reflog.

Change History

  • 2024-07-31 Initial version created.
  • 2024-09-20 Removed descriptions regarding version control for ".gitconfig" in the root directory, as it does not take effect.